A brain-inspired theory of mind spiking neural network improves multi-agent cooperation and competition
Abstract
• ToM helps agents infer others' actions by self-modeling or modeling others
• Agents with ToM optimize self-policy after considering future in MARL
• We build a brain-inspired SNN model to simulate the function and mechanism of ToM
• The model exhibits superior performance on cooperative and competitive tasks

Theory of mind (ToM), a kind of high-level social cognitive ability, enables individuals to infer others' mental states and thus explain and predict others' behavior. ToM plays a crucial role in human social interaction. Currently, agents in multi-agent systems, without ToM, make decisions based on observations of the environment, often ignoring the impact of other agents' behavior on current decisions. Inspired by the ToM mechanism in the brain, this article designed a brain-inspired spiking neural network (SNN) model for ToM and validated it in multi-agent decision-making tasks. This method is effective in the facilitation of competition and cooperation among multiple agents. This work reveals the critical role of ToM in multi-agent interactions and lays the foundation for the development of social intelligence in artificial intelligence.

During dynamic social interaction, inferring and predicting others' behaviors through theory of mind (ToM) is crucial for obtaining benefits in cooperative and competitive tasks. Current multi-agent reinforcement learning (MARL) methods primarily rely on agent observations to select behaviors, but they lack inspiration from ToM, which limits performance. In this article, we propose a multi-agent ToM decision-making (MAToM-DM) model, which consists of a MAToM spiking neural network (MAToM-SNN) module and a decision-making module. We design two brain-inspired ToM modules (Self-MAToM and Other-MAToM) to predict others' behaviors based on self-experience and observations of others, respectively. Each agent can adjust its behavior according to the predicted actions of others. The effectiveness of the proposed model has been demonstrated through experiments conducted in cooperative and competitive tasks. The results indicate that integrating the ToM mechanism can enhance cooperation and competition efficiency and lead to higher rewards compared with traditional MARL models.

Higher animals, such as humans, gradually derive social relationships, which are inseparable from the ability to infer others' mental states, known as theory of mind (ToM). ToM draws on information about others' desires and beliefs to construct representations of the mental states of others in a given situation [1-3]. If agents learn to predict others' behavior, they can avoid predictable trouble and take advantage of opportunities. This social cognition inspired us to explore ToM in multi-agent decision-making. An agent should be able to predict others' behaviors depending on ToM and then adjust its policy to obtain more overall rewards based on these predictions.

A simple example of using ToM is shown in the scenario of Figure 1A, where there are three agents: B (Bob), G (Green), and O (Oli). B needs to predict the behaviors of G and O. In B's own experience, he likes apples, and he finds that O likes oranges from his observation (step 1 in Figure 1A). Since G's behaviors have never been seen, B infers that G likes apples based on his own experience. From his observations of O, B infers that O would choose oranges. Imitation theory suggests that when an agent sees others exhibiting a behavior similar to one it has executed before, the similarity evokes empathy in the observing agent. Thus, the observed agent is inferred to have the same intention [4] (e.g., B's inference about G).

The above process can be simulated by the hypothetical brain circuit in Figure 1B. At the beginning, the agent observes the environment, and the inferior parietal lobule (IPL) and posterior superior temporal sulcus (pSTS) store self-relevant and other-relevant observations, respectively [5,6]. The anterior cingulate cortex (ACC) is stimulated by different expectations [7,8]. The ventral medial prefrontal cortex (vmPFC) stores information related to the self, and the dorsal medial PFC (dmPFC) stores information related to others [9]. Based on the stored information, the dorsolateral PFC (dlPFC) predicts the behaviors of others [10].

Some existing studies focus on ToM mechanism-inspired models. A recursive ToM algorithm modeled in a probabilistic way was used to complete the rock-paper-scissors game [11]. Von der Osten et al. introduced the idea of ToM to model opponents [12]. Nguyen et al. used ToM to help agents learn to cooperate quickly [13]. Baker et al. modeled ToM with Bayesian inference [14,15]. In our past work, we have drawn on the brain circuits and mechanisms of ToM to build models that incorporated multi-scale plasticity and coordinated multiple brain areas [16,17]. These algorithms were separately applied to pass the false belief task and an artificial intelligence (AI) safety risks experiment. More complex multi-agent tasks significantly challenge them. Rabinowitz et al. used trajectories of behavioral data to construct a model that predicts others' behaviors and goals [18], but it is limited in that agents do not use the predictions to collaborate better. In Wang et al.'s model, inferring others' observations and goals jointly helped agents choose their behaviors and led to better collaboration in communication tasks [19]. In addition, the vast majority of MARL work consists of improvements on deep reinforcement learning (RL) methods, with centralized training and distributed decision-making, while lacking references to ToM [20-22]. In general, MARL methods still need to draw more deeply on ToM to improve collaborative decision-making.

Considering the various limitations mentioned above, this article aims to bring a brain-inspired ToM model into MARL. SNNs [23-26] have advantages in simulating the brain's structure and function, extracting spatiotemporal properties, and so on. More importantly, SNNs are more biologically plausible, energy efficient, and naturally suitable for simulating the cognitive functions of the brain [23]. Information in SNNs is transmitted by non-continuous binary spikes, so it is difficult to train SNNs with backpropagation methods. Spike-timing-dependent plasticity (STDP)-based approaches [17,27,28] use the characteristics of synaptic plasticity to solve control tasks, and a number of works [17,29-33] adopted reward-modulated STDP to solve RL problems. However, these methods cannot achieve end-to-end optimization. For SNN-based deep RL, the conversion of artificial neural networks (ANNs) into SNNs [34,35] and the use of surrogate gradients for directly training SNNs [36] are feasible ways to train deep SNNs. A hybrid framework that uses SNNs as the actor and ANNs as the critic can perform well in some continuous control tasks [37]. Our work on self-organized unmanned aerial vehicle (UAV) obstacle avoidance [28] extends SNNs to multi-agent environments but lacks ToM. Little research has implemented MARL with SNNs: Saravanan et al. [38] use SNNs to enable agents to reduce energy consumption while completing tasks, and Ye et al. [39] combine SNNs with a mean-field method to approximate the interactions among device-to-device users and improve the convergence rate. The former explores the feasibility of SNNs in MARL, and the latter solves a practical problem in a specific domain. In contrast, considering the significance of the combination of energy-efficient SNNs and social cognition, this paper proposes the MAToM-SNN model. The core idea is to predict others' behaviors in order to gain more rewards.

Motivated by this, the main contributions of this article are summarized as follows.
(1) MAToM-DM integrates the output of MAToM-SNN into decision-making, so that individual and collective information is integrated to achieve efficient collaboration.
(2) MAToM-SNN is built from spiking neural networks trained with surrogate gradients. It incorporates Self-MAToM and Other-MAToM to predict others' behaviors.
(3) MAToM-DM is evaluated in the stag hunt game and the multi-agent particle environment. Experimental results demonstrate that the introduction of ToM improves group performance (compared with IQL and value-decomposition networks [VDN] with recurrent neural networks [RNNs] and SNNs in the stag hunt game, and with MADDPG with RNNs and SNNs in the multi-agent particle environments).

In MAToM-DM, each agent predicts others' behaviors to improve the team's reward. MAToM-DM contains the MAToM-SNN module and the decision-making module, as shown in Figure 2. MAToM-SNN inputs one's observations into a multi-layer SNN to characterize the attributions of others' behaviors. MAToM-DM concatenates the observation with the obtained prediction to learn the optimal policy. MAToM-SNN is built in two ways, Self-MAToM and Other-MAToM. On the one side, Self-MAToM is trained with self-experience. Nevertheless, Other-MAToM uses the historical behaviors of others for training. The interaction between the two modules (the Self-MAToM module and the Other-MAToM module) could accomplish cooperation and competition. The model is verified in stag hunt tasks [40] with a discrete action space and in the multi-agent particle environment [22,41]. We specifically apply the neuron model and surrogate gradient of the brain-inspired cognitive intelligence engine (BrainCog) [26] for implementing this work.

The stag hunt game is used to evaluate whether agents act cooperatively or act independently for less reward. Because each agent learns the same policy, the experiences of the self and others are the same, so we do not distinguish between Self-MAToM and Other-MAToM in this task. The environment is a 5 × 5 grid. All agents have five physical actions, including left, up, down, right, and stay. In this experiment, the observations are one-dimensional vectors containing the agent's position, other agents' positions, the stag's position, and the plants' positions. The detailed rules are depicted in Figure 3 as follows. The environment contains agents, a stag, and randomly maturing plants. When an agent moves on top of a plant, it harvests the plant; harvesting yields a reward of 1 or nothing, depending on whether the plant is young or mature. The rules also govern when the stag starts moving and when the game ends. Two agents that catch the stag together each receive a reward of 5. An agent that gets close to the stag with no one else is injured and receives a penalty of −5.

The decision-making module uses the VDN [21] method. For the network, the batch size is 250. The length of each episode is 50 steps. The discount factor is 0.99. In this task, we examine how the agents at last learned to chase the stag together. To validate the model, we compare it with several baselines: DQN [42], IQL [20], RNN-based VDN (RVDN) [21], and SNN-based VDN (SVDN). The comparative results against the baselines are shown in the results table. MAToM-RVDN and MAToM-SVDN imply MAToM-SNN combined with RVDN and SVDN, respectively. The experimental results show that RVDN and SVDN outperform DQN and IQL. VDNs with different structures (such as RVDN and SVDN) perform differently, and the total reward is improved by combining them with MAToM-SNN. As the learning curves show (MAToM-RVDN line and RVDN line), in most cases, especially when MAToM-SNN is added to RVDN (MAToM-RVDN), the learning speed is dramatically improved. MAToM-SNN can guarantee almost no loss of performance even in the remaining cases. Similarly, MAToM-SNN also improves SVDN ...

References
1. Sebastian et al. Neural processing associated with cognitive and affective Theory of Mind in adolescents and adults. Soc. Cognit. Affect Neurosci. 2012; 7: 53-63. https://doi.org/10.1093/scan/nsr023
2. Koster-Hale and Saxe. Theory of mind: a neural prediction problem. Neuron. 2013; 79: 836-848. https://doi.org/10.1016/j.neuron.2013.08.020
3. Dennis et al. Cognitive, affective, and conative theory of mind (ToM) in children with traumatic brain injury. Dev. Cogn. Neurosci. 5: 25-39. https://doi.org/10.1016/j.dcn.2012.11.006
4. Gallese and Goldman. Mirror neurons and the simulation theory of mind-reading. Trends Cogn. Sci. 1998; 2: 493-501. https://doi.org/10.1016/S1364-6613(98)01262-5
5. Uddin et al. rTMS to the right inferior parietal lobule disrupts self–other discrimination. Soc. Cognit. Affect Neurosci. 2006; 1: 65-71. https://doi.org/10.1093/scan/nsl003
6. Patel et al. The evolution of the temporoparietal junction and posterior superior temporal sulcus. Cortex. 2019; 118: 38-50. https://doi.org/10.1016/j.cortex.2019.01.026
7. Shenhav et al. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron. 79: 217-240. https://doi.org/10.1016/j.neuron.2013.07.007
8. Wang et al. The dorsal anterior cingulate cortex modulates dialectical self-thinking. Front. Psychol. 2016; 152. https://doi.org/10.3389/fpsyg.2016.00152
9. Abu-Akel and Shamay-Tsoory. Neuroanatomical and neurochemical bases of theory of mind. Neuropsychologia. 2011; 49: 2971-2984. https://doi.org/10.1016/j.neuropsychologia.2011.07.012
10. Suzuki et al. Learning to simulate others' decisions. Neuron. 74: 1125-1137. https://doi.org/10.1016/j.neuron.2012.04.030
11. De Weerd et al. How much does it help to know what she knows you know? An agent-based simulation study. Artif. Intell. 199-200: 67-92. https://doi.org/10.1016/j.artint.2013.05.004
12. Von der Osten et al. The minds of many: opponent modeling in a stochastic game. In: IJCAI. 2017: 3845-3851.
13. Nguyen et al. Theory of mind with guilt aversion facilitates cooperative reinforcement learning. In: Asian Conference on Machine Learning. PMLR, 2020.
14. Baker et al. Bayesian theory of mind: modeling joint belief-desire attribution. Proceedings of the annual meeting of the cognitive science society. 33: 2469-2474.
15. Baker et al. Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nat. Human Behav. 2017; 0064. https://doi.org/10.1038/s41562-017-0064
16. Zeng et al. A brain-inspired model of theory of mind. Front. Neurorob. 2020; 14: 60. https://doi.org/10.3389/fnbot.2020.00060
17. Zhao et al. A brain-inspired theory of mind spiking neural network for reducing safety risks of other agents. Front. Neurosci. 2022; 16: 753900. https://doi.org/10.3389/fnins.2022.753900
18. Rabinowitz et al. Machine theory of mind. In: International conference on machine learning. 2018: 4218-4227.
19. Wang et al. Tom2c: target-oriented multi-agent communication and cooperation with theory of mind. arXiv. 2021; (Preprint at) https://doi.org/10.48550/arXiv.2111.09189
20. Tampuu et al. Multiagent cooperation and competition with deep reinforcement learning. PLoS One. 12: e0172395. https://doi.org/10.1371/journal.pone.0172395
21. Sunehag et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS '18. Richland, SC. 2085-2087.
22. Lowe et al. Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems 30. Curran Associates, Inc. 6379-6390.
23. Maass. Networks of spiking neurons: the third generation of neural network models. Neural Network. 1997; 10: 1659-1671. https://doi.org/10.1016/S0893-6080(97)00011-7
24. Ghosh-Dastidar and Adeli. Spiking neural networks. Int. J. Neural Syst. 2009; 19: 295-308. https://doi.org/10.1142/S0129065709002002
25. Khalil et al. The effects of dynamical synapses on firing rate activity: a spiking neural network model. Eur. J. Neurosci. 46: 2445-2470. https://doi.org/10.1111/ejn.13712
26. Zeng et al. Braincog: a spiking neural network based brain-inspired cognitive intelligence engine for brain-inspired ai and brain simulation. arXiv. (Preprint at) https://doi.org/10.48550/arXiv.2207.08533
27. Vasquez Tieck et al. Learning target reaching motions with a robotic arm using brain-inspired dopamine modulated STDP. In: 2019 IEEE 18th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC). 2019: 54-61.
28. Zhao et al. Nature-inspired self-organizing collision avoidance for drone swarm based on reward-modulated spiking neural network. Patterns. 3: 100611. https://doi.org/10.1016/j.patter.2022.100611
29. Izhikevich. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebr. Cortex. 2007; 17: 2443-2452. https://doi.org/10.1093/cercor/bhl152
30. Frémaux and Gerstner. Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front. Neural Circ. 9: 85. https://doi.org/10.3389/fncir.2015.00085
31. Sanda et al. Multi-layer network utilizing rewarded spike time dependent plasticity to learn a foraging task. PLoS Comput. Biol. 13: e1005705. https://doi.org/10.1371/journal.pcbi.1005705
32. Zhao et al. A brain-inspired decision-making spiking neural network and its application in unmanned aerial vehicle. Front. Neurorob. 2018; 12: 56. https://doi.org/10.3389/fnbot.2018.00056
33. Zhao et al. A neural algorithm for drosophila linear and nonlinear decision-making. Sci. Rep. 10: 18660. https://doi.org/10.1038/s41598-020-75628-y
34. Patel et al. Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to atari breakout game. Neural Network. 120: 108-115. https://doi.org/10.1016/j.neunet.2019.08.009
35. Tan et al. Strategy and benchmark for converting deep q-networks to event-driven spiking neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence 35. 2021: 9816-9824.
36. Sun et al. Solving the spike feature information vanishing problem in spiking deep Q network with potential based normalization. arXiv. (Preprint at) https://doi.org/10.48550/arXiv.2206.03654
37. Tang et al. Deep reinforcement learning with population-coded spiking neural network for continuous control. In: Kober, Ramos, Tomlin (eds.), Proceedings of the 2020 Conference on Robot Learning, vol. 155. 2016-2029.
38. Saravanan et al. Exploring spiking neural networks in single and multi-agent rl methods. In: 2021 International Conference on Rebooting Computing (ICRC). IEEE. 88-98.
39. Ye et al. In: 20th International Conference on Ubiquitous Computing and Communications (IUCC/CIT/DSCI/SmartCNS). 60-67.
40. Nesterov-Rappoport D.L. Evolution of trust: Understanding prosocial behavior in multi-agent reinforcement learning systems. Tech. Rep., Drew University, Madison, NJ, 2022. https://doi.org/10.1098/rsif.2020.0491
41. Mordatch and Abbeel. Emergence of grounded compositional language in multi-agent populations. Proceedings of the AAAI Conference on Artificial Intelligence 32. 2018. https://doi.org/10.1609/aaai.v32i1.11492
42. Mnih et al. Human-level control through deep reinforcement learning. Nature. 2015; 518: 529-533. https://doi.org/10.1038/nature14236
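The decision flow described above (a ToM prediction of others' actions is concatenated with the agent's own observation, and per-agent values are combined with VDN) can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the function names, the one-hot encoding of predicted actions, the linear stand-in for the Q-network, and the example weights are all hypothetical and are not taken from the paper. The ToM prediction is passed in as given data; the spiking network that would produce it is not modeled here. The only part that follows a published definition is the VDN mixing step, where the team value is the sum of the agents' chosen-action values.

```python
"""Toy sketch: feed ToM predictions into per-agent Q-values, then mix with VDN."""

ACTIONS = ["left", "up", "down", "right", "stay"]  # the five actions named in the text


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def tom_augmented_input(observation, predicted_other_actions):
    """Concatenate an agent's observation with one-hot predictions of others' actions.

    `predicted_other_actions` stands in for the output of a ToM module; how that
    prediction is produced (an SNN in the paper) is outside this sketch.
    """
    features = list(observation)
    for action_index in predicted_other_actions:
        features.extend(one_hot(action_index, len(ACTIONS)))
    return features


def linear_q_values(features, weights):
    """Hypothetical stand-in for a Q-network: one linear layer, one row per action."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def greedy_action(q_values):
    """Index of the largest Q-value (the first one wins on ties)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def vdn_team_q(per_agent_q_values, chosen_actions):
    """VDN mixing: the team Q-value is the sum of each agent's chosen-action Q-value."""
    return sum(q[a] for q, a in zip(per_agent_q_values, chosen_actions))


# Example: two agents; observation = (x, y) of self, other agent, stag, and one plant.
obs_a = [0, 0, 4, 4, 2, 2, 1, 3]
obs_b = [4, 4, 0, 0, 2, 2, 1, 3]
inputs_a = tom_augmented_input(obs_a, [ACTIONS.index("left")])   # A predicts B moves left
inputs_b = tom_augmented_input(obs_b, [ACTIONS.index("right")])  # B predicts A moves right

# Arbitrary illustrative weights: action k simply reads feature k.
weights = [one_hot(k, len(inputs_a)) for k in range(len(ACTIONS))]
q_a = linear_q_values(inputs_a, weights)
q_b = linear_q_values(inputs_b, weights)
actions = [greedy_action(q_a), greedy_action(q_b)]
team_q = vdn_team_q([q_a, q_b], actions)
```

Keeping the prediction as an explicit extra input makes it easy to compare a baseline (empty prediction list) against the ToM-augmented variant with the same value function shape, which mirrors the RVDN versus MAToM-RVDN comparison in spirit only.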
Similar resources
TempUnit: A Bio-Inspired Spiking Neural Network
Formal neural networks have many applications. Applications of control of tasks (motor control) as well as speech generation have a certain number of common constraints. We are going to see seven main constraints that a system based on a neural network should follow in order to be able to produce that kind of control. Afterwards we will present the TempUnit model which is able to give some answ...
Dynamic Cooperation and Competition in a Network of Spiking Neurons
We discuss recurrent networks with local excitatory and surrounding inhibitory connectivity and their implications as model for the dynamics of sensory awareness and oculomotor programming. A spiking version of such a model and its association with simulations of the superior colliculus are reviewed. We also discuss the competition for attention within this model as put forward by Taylor, and p...
Distinct neural patterns of social cognition for cooperation versus competition
How do people consider other minds during cooperation versus competition? Some accounts predict that theory of mind (ToM) is recruited more for cooperation versus competition or competition versus cooperation, whereas other accounts predict similar recruitment across these two contexts. The present fMRI study examined activity in brain regions for ToM (bilateral temporoparietal junction, precun...
Multi-objective Modeling Based on Competition Airlines Cooperation by Game Theory and Sustainable Development Approach
In each time period, the demand of passengers for each route are finite and airlines compete for earning more profits. The complex competition among airlines causes problems, such as complicating flight planning and increasing empty seats for some routes. These problems increase air pollution and fuel consumption. To solve these problems, this research studies the cooperation of the airlines wi...
A Theory of Cooperation - Competition and Beyond
This chapter is concerned with my interrelated theoretical work in the areas of cooperation competition, conflict resolution, social justice, and social relations. The theory of cooperation competition is a component of the other theories. Thus, the theory of conflict resolution is based on this theory and my Crude Law of Social Relations. My work in social justice is also based on this theory,...
Journal
Journal title: Patterns
Year: 2023
ISSN: ['2666-3899']
DOI: https://doi.org/10.1016/j.patter.2023.100775